

Search for: All records

Creators/Authors contains: "Zeng, Qingkai"


  1. Taxonomies serve many applications with a structural representation of knowledge. To incorporate emerging concepts into existing taxonomies, the task of taxonomy completion aims to find suitable positions for emerging query concepts. Previous work used pre-trained language models to capture homogeneous token-level interactions within the concatenation of a query concept's term and definition, but ignored the token-level interactions between the term and definition of the query concept and those of its related concepts. In this work, we propose to capture heterogeneous token-level interactions between the different textual components of concepts that hold different types of relations. We design a relation-aware mutual attention module (RAMA) to learn such interactions for taxonomy completion. Experimental results demonstrate that our new taxonomy completion framework based on RAMA achieves state-of-the-art performance on six taxonomy datasets.
    Free, publicly-accessible full text available July 1, 2024
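    A minimal sketch of the relation-aware mutual attention idea described above, assuming a PyTorch setting; the module, parameter, and relation names are illustrative assumptions, not taken from the paper's released code:

    import torch
    import torch.nn as nn

    # Hypothetical relation-aware mutual (cross) attention between the tokens of a
    # query concept (term + definition) and the tokens of a related concept.
    class RelationAwareMutualAttention(nn.Module):
        def __init__(self, dim=32, relation_types=("parent", "child"), heads=4):
            super().__init__()
            # one cross-attention block per relation type -> heterogeneous interactions
            self.attn = nn.ModuleDict({
                r: nn.MultiheadAttention(dim, heads, batch_first=True)
                for r in relation_types
            })
            self.score = nn.Linear(dim, 1)

        def forward(self, query_tokens, related_tokens, relation):
            # query_tokens: (batch, q_len, dim); related_tokens: (batch, r_len, dim)
            attended, _ = self.attn[relation](query_tokens, related_tokens, related_tokens)
            pooled = attended.mean(dim=1)          # mean-pool the attended query tokens
            return self.score(pooled).squeeze(-1)  # plausibility of attaching here

    model = RelationAwareMutualAttention()
    q, c = torch.randn(2, 10, 32), torch.randn(2, 12, 32)
    print(model(q, c, relation="parent").shape)    # -> torch.Size([2])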
  2. People look for complementary contexts, such as team members with complementary skills when building a project team or reading materials with complementary knowledge for effective student learning, to make their behaviors more likely to succeed. Behavioral sciences have identified complementarity as one of the most important factors in decision making. Existing computational models that learn low-dimensional context representations from behavior data scale poorly, and recent network embedding methods focus only on preserving similarity between contexts. In this work, we formulate a behavior entry as a set of context items and propose a novel representation learning method, Multi-type Itemset Embedding, to learn context representations that preserve itemset structures. We also propose a measurement of complementarity between context items in the embedding space. Experiments demonstrate both the effectiveness and the efficiency of the proposed method over state-of-the-art methods on behavior prediction and context recommendation. We discover that complementary contexts and similar contexts are significantly different in human behaviors.
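    A toy illustration of the distinction drawn above between similarity and complementarity in an embedding space; the complementarity measure below is a hypothetical stand-in, not the paper's actual formulation:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def itemset_vector(items, emb):
        # represent an itemset by the normalized sum of its members' embeddings
        v = np.sum([emb[i] for i in items], axis=0)
        return v / (np.linalg.norm(v) + 1e-9)

    def complementarity(item, itemset, target, emb):
        # hypothetical measure: how much adding `item` moves the itemset's
        # representation toward a target "successful behavior" embedding
        before = cosine(itemset_vector(itemset, emb), target)
        after = cosine(itemset_vector(itemset + [item], emb), target)
        return after - before

    rng = np.random.default_rng(0)
    emb = {name: rng.normal(size=16) for name in ("backend", "frontend", "design")}
    target = emb["backend"] + emb["frontend"]      # toy direction of a successful team
    print("similarity     :", cosine(emb["backend"], emb["frontend"]))
    print("complementarity:", complementarity("frontend", ["backend"], target, emb))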
  3. Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been limited by the availability of labeled data and by runtime efficiency. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy that enables trace models to transfer knowledge from a closely related software engineering challenge with a rich dataset, producing trace links with much higher accuracy than previously achieved. We then apply the T-BERT framework to recover links between issues and commits in open source projects. We comparatively evaluated the accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-world OSS projects, the best T-BERT consistently outperformed the VSM model, with an average improvement of 60.31% in Mean Average Precision (MAP). The RNN model severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.
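    A minimal sketch contrasting the Siamese and single-encoder setups compared above, using a small stand-in encoder rather than the pretrained BERT models evaluated in the paper; all names and shapes are illustrative assumptions:

    import torch
    import torch.nn as nn

    class TinyEncoder(nn.Module):
        # small stand-in for the pretrained encoder; pools token states to one vector
        def __init__(self, vocab=1000, dim=64):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.enc = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, ids):                         # ids: (batch, seq_len)
            return self.enc(self.emb(ids)).mean(dim=1)  # (batch, dim)

    encoder = TinyEncoder()
    head = nn.Linear(64, 1)

    def siamese_score(nl_ids, code_ids):
        # separate encodings compared by cosine; code vectors can be precomputed,
        # which is what makes the Siamese variant fast at retrieval time
        return nn.functional.cosine_similarity(encoder(nl_ids), encoder(code_ids))

    def single_score(nl_ids, code_ids):
        # joint encoding of the concatenated pair; typically more accurate but slower
        return head(encoder(torch.cat([nl_ids, code_ids], dim=1))).squeeze(-1)

    nl = torch.randint(0, 1000, (2, 16))
    code = torch.randint(0, 1000, (2, 32))
    print(siamese_score(nl, code).shape, single_score(nl, code).shape)  # (2,) (2,)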
  4. Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering. Existing taxonomy expansion or completion methods assume that new concepts have been accurately extracted and that their embedding vectors have been learned from the text corpus. However, one critical and fundamental challenge in fixing the incompleteness of taxonomies is the incompleteness of the extracted concepts, especially for concepts whose names contain multiple words and therefore appear with low frequency in the corpus. To overcome the limitations of extraction-based methods, we propose GenTaxo, which enhances taxonomy completion by identifying positions in existing taxonomies that need new concepts and then generating appropriate concept names. Instead of relying on the corpus for concept embeddings, GenTaxo learns contextual embeddings for these positions from their surrounding graph-based and language-based relational information, and leverages the corpus to pre-train a concept name generator. Experimental results demonstrate that GenTaxo improves the completeness of taxonomies over existing methods.
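    A schematic sketch of the two-stage idea described above (scoring positions that may be missing a concept, then generating a name for the chosen position); both functions are hypothetical stand-ins for the learned components:

    # toy taxonomy: parent -> list of children
    taxonomy = {
        "machine learning": ["supervised learning", "unsupervised learning"],
        "supervised learning": ["classification"],
        "unsupervised learning": [],
        "classification": [],
    }

    def position_score(parent, children):
        # stand-in heuristic: sparsely populated parents may be missing concepts;
        # the paper instead learns this from graph- and text-based context
        return 1.0 / (1 + len(children))

    def generate_name(parent, children):
        # stand-in for the corpus-pretrained concept-name generator
        return f"<new child of '{parent}', siblings: {children or 'none'}>"

    parent, children = max(taxonomy.items(), key=lambda kv: position_score(*kv))
    print("position needing a concept:", parent)
    print("generated concept name    :", generate_name(parent, children))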
  5. In many regulated domains, traceability is established across diverse artifacts such as requirements, design, code, test cases, and hazards, either manually or with the help of supporting tools, and the resulting trace links are used to support activities such as impact analysis, compliance verification, and safety inspections. Automated tracing techniques need to leverage the semantics of the underlying artifacts in order to establish more accurate trace links and to explain links that have been created either manually or automatically. To support this, we propose an automated technique that leverages source code, project artifacts, and an external domain corpus to generate a domain-specific concept model. We then use the generated concept model to improve traceability results and to provide explanations of those results. Our approach overcomes existing problems with deep-learning traceability algorithms, as it does not require a training set of existing trace links. Finally, as an initial proof of concept, we apply our semantically guided approach to the Dronology project and show that it improves over other tracing techniques that do not use a concept model.
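    A simplified sketch of how a domain concept model could bridge requirement terms and code identifiers while doubling as a link explanation; the tiny hand-written concept model and the scoring rule below are assumptions for illustration only:

    # tiny hand-written concept model; the real one is generated from code,
    # project artifacts, and an external domain corpus
    concept_model = {
        "altitude": {"elevation", "height"},
        "uav": {"drone", "vehicle"},
    }

    def expand(terms):
        expanded = set(terms)
        for t in terms:
            expanded |= concept_model.get(t, set())
        return expanded

    def trace(requirement_terms, code_terms):
        shared = expand(requirement_terms) & expand(code_terms)
        score = len(shared) / max(len(requirement_terms), 1)
        return score, shared    # the shared concepts double as the explanation

    req = {"uav", "altitude", "limit"}
    code = {"drone", "elevation", "check"}
    score, why = trace(req, code)
    print(f"link score={score:.2f}, explained by shared concepts: {sorted(why)}")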